Manipulation of multimedia data in multimedia databases is not as straightforward
as in conventional databases because of the complex structure of multimedia
data such as images or sound. The issue in the retrieval of multimedia data
from the database is the matching of the contents of multimedia data to a
user query. The common solution is to use either keywords or
natural language descriptions to describe both the contents of multimedia 
data and user queries. The major problem is that different users, or the
same user at different times, describe the same thing differently, so the
descriptions of the contents of multimedia data rarely match the descriptions
of the user queries exactly. Hence, partial or approximate
match between descriptions of multimedia data and user queries is often 
required during multimedia data retrieval. In this paper, we propose an in- 
telligent approach to approximate match by integrating both object-oriented 
and natural language understanding techniques. 

1. Introduction 

A multimedia database management system supports the management of multimedia
data, which includes image and sound among others, in addition to supporting
conventional databases. Multimedia systems are currently gaining a lot of attention because
today's technology has made it possible to capture and store multimedia data in computers.
Multimedia data broadens the communication between the computer system and the
user. Many applications, such as military, publishing, or instructional applications,
routinely need multimedia data. Although the cost of the hardware required to handle
multimedia data is decreasing rapidly, the software needed to manage such multimedia
data is lacking or inefficient.

[WOELK86] identified two types of requirements which multimedia applications im- 
pose on a database system. One requirement is for a data model that allows a natural and 
flexible definition as well as the evolution of the schema that can represent the composition
of and the complex relationships among parts of multimedia data. Our earlier work
[MEYER88, LUM89] adopted the extended relational model as the basis for addressing
all data modelling requirements. The other requirement is for the sharing and manipulation
of multimedia data. In this paper, we emphasize the sharing and manipulation
of multimedia data for multimedia applications.

The focus of this paper is the manipulation of multimedia data in a multimedia database
system. In particular, we describe in detail an efficient method for the retrieval of
multimedia data by way of inexact matching. In conventional databases, retrieval of standard
numerical and alphanumeric data is handled by utilizing the content of the data. The
fundamental problem that one must face in the context of a multimedia database is the
question of how to handle content search. There is no easy solution: because multimedia
data are intrinsically rich in semantics, it is difficult to find the appropriate data
conveniently and efficiently based on their contents.

In developing an efficient retrieval method for multimedia data, we concluded that it is 
not possible to utilize the content directly. This is a fair conclusion since the content of a 
multimedia data is mostly unstructured complex data like an image or a sound. As in most 
other systems, we follow the approach of content-based search by means of verbal
descriptions of the contents of multimedia data. The well-known keyword approach to content
description is not suitable because it is known to be imprecise and users often
have difficulty focusing the search on data of interest. Hence, we adopt the natural
language approach to content description as a more viable option.

The methodology we adopt consists of associating a natural language description to 
each multimedia data and using the description to retrieve the relevant data. More pre- 
cisely, the description of a multimedia data is matched against the description of a user 
query, which is also expressed in natural language. The major problem with this
approach is that the description of a multimedia data often does not exactly
match the description of a user query. The reason is that it is difficult for different users, or
the same user at different times, to describe the same thing identically: they may
use synonyms, generalize or specialize categories belonging to the domain of interest, and
so on. Hence, the key to an efficient retrieval process is to automatically perform partial or
approximate matching of the description of a multimedia data to the description of a user
query whenever an exact match is not possible. In this paper, we propose an intelligent approach
to approximate matching by integrating object-oriented and natural language understand- 
ing techniques. 

This paper makes two contributions. The major contribution is the formulation of a gen- 
eral scheme to retrieve data that comprises a variety of multimedia data stored in a data- 
base with special emphasis on approximate match. As far as we know, very little research 
on partial or approximate matching has been conducted in natural language processing. 
The retrieval method may also be adopted easily in the field of intelligent information
retrieval. The second, related but less significant, contribution is to support the claim that
object-oriented technology can be adopted and applied easily to multimedia system
applications.

The paper is organized as follows. Section 2 discusses related work. Section 3 ad- 
dresses fundamental problems and outlines various components to multimedia data re- 
trieval in a multimedia database system. Section 4 describes in detail our approximate 
match algorithm during retrieval of multimedia data. The summary is given in Section 5. 

2. Related Work 

Several multimedia projects have been undertaken by various researchers in both
academia and industry over the past several years. The MINOS system [CHRIST86],
developed by a team at the University of Toronto, manages highly structured multimedia
objects that consist of attributes as well as text, image and voice parts. Sophisticated
browsing and user interface features allow browsing of the schema as well as synchro- 
nized updates. The MCC Database program [WOELK86,87] also undertook several mul- 
timedia projects by establishing the database requirements of multimedia applications. 
They identified requirements for a data model and for the sharing and manipulation of
multimedia data. Hypertext has also been extended to manage image and sound. One
notable outcome is the INTERMEDIA system [YANKE88] developed at Brown University. 
[MASU87] has developed a framework to classify and compare the different projects. 

In [LWH87] the argument is made that storing multimedia data is one thing but orga- 
nizing a large amount of them for efficient search and retrieval is quite another. The fun- 
damental difficulty in the retrieval of multimedia data lies in the problem of handling the 
rich semantics that is contained in the data. In [LUM89], we introduced the approach of 
content-based search by means of natural language descriptions that form a part of a
multimedia data. This approach is related to the research on intelligent information retriev- 
al (IR). 

Intelligent IR deals with the overlap of research in artificial intelligence (AI) and IR.
Starting several years ago, researchers in IR became interested in direct access to
tabular and textual data. At the same time, research in natural language processing
has reached a point where it can contribute to intelligent IR [SCHA81]. In particular,
research in the organization of conceptual memory [DEJON79, KOLO80] may be
appropriate to the design of intelligent fact retrieval systems.

A large part of IR research has dealt with techniques for manual and automatic indexing,
which is the process of document and request representation. Much of this research
is summarized in [SALT74, MARON79]. Also, a strong theoretical framework has been
built up in IR around the probabilistic model of retrieval [SALT83, RIJS86]. To overcome
the retrieval problem in IR, this model uses pure statistical measures based on keyword
frequencies. Finally, clustering research in IR has focused on generating relationships
between documents and retrieving groups of documents, and has emphasized solutions
to the best-match problem [SHAS90], also known as the nearest-neighbor problem
[SMEA81] or the closest point problem [SHAM75].

There was early interest in applying AI techniques to the domain of IR [SPAR78, SMIT80].
The IRUS system [BATES83], which is designed for processing heterogeneous databases
through natural language queries, is more representative of modern attempts. The
RUBRIC system [TONG87] is a production rule-based IR system in which the indexing
base of the system contains positional information about words in the texts, which allows
positional controls on words while processing queries. The I3R system [CROF87] provides
assistance to users at all stages of the retrieval process and consists of a set of expert
systems managed by a scheduler. Last but not least, the IOTA system [CHIA87] tries
to improve the qualitative performance of IR systems by replacing keywords with noun
groups involving extensive semantics.

The approach we propose is somewhat different from the intelligent IR systems mentioned.
Most of the work in these systems is mainly concerned with natural
language processing, particularly query processing, and with deductive capabilities based on
an extended semantic model of document content and sometimes from the user. Our
approach also shares these characteristics. However, the matching function between
system concepts and user concepts is based on exact matching in many systems,
while our approach is based on approximate matching. Even in systems with approximate
matching capabilities, the matching functions used are primitive or superficial at best
compared to our approach, which utilizes object-oriented technology to improve the quality of
the matching process.

3. Architecture of Multimedia Data Retrieval System 

In this section, we outline the architecture of the intelligent retrieval system for a mul- 
timedia database system. The architecture consists of the various components of the mul- 
timedia data retrieval system. Before we continue, definitions and various issues 
associated with intelligent retrieval of multimedia data are addressed. 

3.1 Definitions and Background 

As mentioned before, multimedia data, in the broadest sense, consists of unformatted
data such as text, image, voice, signals, etc. in addition to alphanumeric data. We define
a multimedia database management system (MDBMS) as a system that manages all
multimedia data and provides mechanisms to handle concurrency, consistency, and recovery
in addition to providing a query language and query processing.

Despite differences in data model and implementation aspects, all research projects
on MDBMS have decided to organize multimedia data using the abstract data type (ADT)
concept. This is generally accepted as an adequate approach. However, none of the
projects has addressed the problem of content description and retrieval of multimedia
data.

The fundamental difficulty in handling multimedia data is intrinsically tied to its very rich
semantics. To illustrate this difficulty, let us look at an image of ships. Given such a
picture, how are we to know what types of ships are in the picture? In other words, are the ships
destroyers, cruisers, submarines or passenger ships? As another example, let us suppose
that there is a picture of a dog and a cat. How do we know if they are chasing each
other or playing?

To answer queries posed on images, for example, a person must draw from a very rich
experience encountered in life to arrive at a good answer. One must have a sophisticated
technique to analyze the contents of the images to get the semantics of different things in
the images. Technology today is not advanced enough to expect systems to have this kind
of capability to answer multimedia queries. However, we can use both AI and IR technology
to do the next best thing. We can abstract the contents of multimedia data into words or
text and use the text description equivalent of the original multimedia data to match the
user request or query. This is the principle we will use in designing an MDBMS to handle
multimedia data for different applications. Figure 1 shows the format of a multimedia data,
which consists of the registration, raw and description data.



IMAGE
    Registration Data (Height, Width, Depth, Colormap, etc.)
    Raw Data (Matrix of Pixels in Raster/Bitmap Format)
    Description Data (abstracted content of image using text)

Figure 1: Multimedia Data Format



Raw data is the bit string representation of the image, sound, signal, etc. obtained from 
scanning or digitizing the original multimedia data. Registration data generally enhances 
the information about raw data and is not redundant. The contents of multimedia data are
described by description data. Description data cannot be automatically derived by the
computer given the technology today. We assume that users will supply the description 
data for multimedia data in a natural language form.

3.2 Architecture

In this section, we present the various components of an MDBMS that deal with multimedia
data retrieval. This is a modified version of the architecture of an MDBMS discussed
in [LUM89]. Our proposed architecture enhances the performance of the matcher
component and adds user interface capabilities which are lacking in the architecture
proposed in [LUM89]. The components of the architecture are shown in Figure 2.




Figure 2: Architecture of MDBMS Data Retrieval System 

As shown in Figure 2, the components break down into query processor, description
manager, user interface, parser, generator and matcher. The query processor accepts
queries from users and executes them by calling the other components. When a new
description for a multimedia data is entered, for example, the query processor calls the
parser. The parser uses the dictionary to produce first-order predicates and returns them to the
query processor. The query processor then hands the predicates over to the description
manager, which then links the description to its multimedia data.

When the query processor receives a query with text description, it calls the parser to 
obtain the equivalent query predicates. The predicates are then handed to the matcher. 
The matcher tries to match the query with the qualified multimedia data by comparing the
predicates of the query with those of the stored multimedia data. The matcher does this by
calling the description manager and using domain knowledge. In addition, if an exact 
match is not possible, the matcher automatically switches to approximate match. To guide 
the matching process, the matcher also gets input from the user by calling the user inter- 
face. 

As the solution to a query, the query processor returns links to the qualified multimedia 
data. These links are handed to the generator which calls the description manager to get 
the predicates. The generator uses the dictionary to generate a sequence of natural language
phrases, which it returns to the query processor for output.

The details of each component are discussed in [LUM89, HOLT90]. In addition, the 
query processor, description manager, parser, matcher and generator have already been 
implemented as part of the prototype MDBMS developed at the Naval Postgraduate 
School [MEYER88, LUM89, PEI90]. In this paper, we propose an approximate matching
process for the matcher as well as a user interaction technique for the user interface.

3.3 Natural Language Description for Multimedia Data 

As mentioned, we propose to perform retrieval of multimedia data by matching the nat- 
ural language descriptions with the query specifications. We discarded the keyword 
search technique as a viable option because keywords are discrete and lack complex link- 
ing mechanisms to adequately capture the contents of multimedia data. In addition, it is 
not always possible to convey exact meanings using only keywords. 

We believe that unrestricted natural language processing is very difficult to achieve 
given today's AI technology. However, each multimedia application restricts the scope
of the description of multimedia data in the particular application. Hence, instead of natural 
language description, we use captions to describe multimedia data. Captions are a natural 
but special, stylized way of writing descriptions with a subset of natural language. The de- 
tails of the captions and their restrictions for our objectives are beyond the scope of this 
paper and are given in [HOLT90, ROWE91]. 

Natural language descriptions have the advantage that everyone is familiar with them,
resulting in a high acceptance rate. However, this does not solve the problem of description
understanding and matching, since focusing on the specific application domain
is also necessary. To handle this problem, we provide a dictionary in which the users define
the domain of each application, thus restricting the vocabulary, the semantics and
the knowledge that the system applies.

The parser translates the text description into a set of predicates. The imprecision and
ambiguity of the natural language descriptions are reduced considerably by transforming
them into a set of predicates. These predicates state facts about the real world entities
involved with multimedia data, such as their properties and relationships. As in most parsing
methods, we chose first-order predicate calculus as the formal representation of
the description data. The parser depends on the dictionary to turn the descriptions into
predicates. It is the parser's task to use the dictionary to resolve synonyms and to check
the syntactic context to resolve lexical ambiguities.

Our parser also provides mechanisms to automatically partition a user query into the
subject, verb and object components. This is essential because, during data retrieval as we
will see later, we can use the partitioned components to match against domain-dependent
knowledge, which also breaks down into subject, verb and object categories. The details
of the parser and the predicates are beyond the scope of this paper and are given in
[LUM89, HOLT90, DULLE90].

An example of a natural language description and its translation into an equivalent set
of predicates using the parser is shown below:

Description: "A car with red body"

Predicates: car(x), component(x,y), body(y), color(y,red)

Choosing the right set of predicates is a very difficult task which is comparable to 
knowledge acquisition for expert systems. For the purposes of this paper, it is sufficient to 
assume that the dictionary lists all the words the parser can recognize, all the parts of 
speech associated with any word, and the predicates to use when a word appears in the 
description. Thus, the set of all predicates that can be used in the descriptions must be 
defined in the dictionary. 

4. Matching 

In this section, we propose new ways of matching natural language descriptions of the
multimedia data with the query specifications. The key to our matching process is the use
of domain knowledge represented using the notion of class hierarchy borrowed from
the object-oriented field. Before we continue, we first discuss some specific problems
found in our current matching capability that we alluded to earlier. This will serve as the
motivation behind our new intelligent approach to approximate matching.

4.1 Problems in Matching 

In our current system [LUM89,HOLT90], the result of parsing is one set of predicates 
per multimedia data instance. A query description is also entered in natural language and 
parsed. The arguments of the query predicates can be variables. A multimedia data is se- 
lected as the result of the query, if there exists a binding of query predicates to description 
predicates of multimedia data. The match of user query to multimedia data need not be 
exact. A set of rules, sometimes domain dependent, specifies situations in which sets of 
predicates that look different are really the same thing. 

The matching catches different natural language phrases with the same meaning, but 
not the semantic relationships among the predicates. For example, let us reconsider the
description, "a car with red body", of an image multimedia data. The predicates generated
are "car(x), component(x,y), body(y), color(y,red)". For the sake of argument, we consider
a query with the description, "a red car". The query would be translated into something like
"car(x), color(x,red)". There would be no match because the system does not know that
the color of a car's body is identical to the color of the car.
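
The binding test described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the matcher of the prototype: the tuple encoding of predicates, the "?" prefix that marks query variables, the constants o1 and o2 standing for the entities of the stored description, and the function names unify and find_binding are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

# Assumed encoding: a predicate is (name, arguments); query variables start with "?".
Predicate = Tuple[str, Tuple[str, ...]]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def unify(query: Predicate, fact: Predicate, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `query` equals `fact`, or return None."""
    (qname, qargs), (fname, fargs) = query, fact
    if qname != fname or len(qargs) != len(fargs):
        return None
    extended = dict(binding)
    for q, f in zip(qargs, fargs):
        if is_var(q):
            # A variable must keep the value it was first bound to.
            if extended.setdefault(q, f) != f:
                return None
        elif q != f:
            return None
    return extended


def find_binding(query: List[Predicate], facts: List[Predicate],
                 binding: Optional[Binding] = None) -> Optional[Binding]:
    """Backtracking search for one binding that satisfies every query predicate."""
    binding = {} if binding is None else binding
    if not query:
        return binding
    first, rest = query[0], query[1:]
    for fact in facts:
        extended = unify(first, fact, binding)
        if extended is not None:
            result = find_binding(rest, facts, extended)
            if result is not None:
                return result
    return None


# Stored description "a car with red body" (o1 and o2 are assumed entity names).
description = [("car", ("o1",)), ("component", ("o1", "o2")),
               ("body", ("o2",)), ("color", ("o2", "red"))]

same_query = [("car", ("?x",)), ("component", ("?x", "?y")),
              ("body", ("?y",)), ("color", ("?y", "red"))]
red_car = [("car", ("?x",)), ("color", ("?x", "red"))]

print(find_binding(same_query, description))  # a binding exists
print(find_binding(red_car, description))     # no binding: "a red car" does not match
```

Under these assumptions the query "a red car" fails because the only color fact is attached to the body, not to the car itself.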

To overcome this problem, rules can be introduced to express the semantic relation- 
ships among the predicates. In the above case, the rule introduced could be: 

If (car(X), component(X,Y), body(Y), color(Y,Z)) then color(X,Z)

Using the above rule, color(x,red) can be deduced in the example above and there 
would be a match between the query and the description. A key unsolved problem, how- 
ever, is the question of which literals of the predicates to generalize to get a match, and 
how far to generalize. This falls into the category of approximate matching to a user query 
that we mentioned earlier in the paper. We believe that the answer lies in the use of do- 
main-dependent knowledge. 
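
To make the deduction step concrete, the following Python sketch applies the single rule given above to a set of stored facts. It is a hand-coded illustration under stated assumptions, not the rule engine of the prototype: facts are encoded as flat tuples, o1 and o2 are assumed entity names, and apply_car_body_color_rule is a hypothetical helper written only for this one rule.

```python
from typing import Set, Tuple

# Assumed encoding: ("car", "o1"), ("component", "o1", "o2"), ("color", "o2", "red"), ...
Fact = Tuple[str, ...]


def apply_car_body_color_rule(facts: Set[Fact]) -> Set[Fact]:
    """If car(X), component(X,Y), body(Y), color(Y,Z) then color(X,Z)."""
    derived = set(facts)
    for fact in facts:
        if fact[0] != "component":
            continue
        _, x, y = fact
        if ("car", x) in facts and ("body", y) in facts:
            for other in facts:
                if other[0] == "color" and other[1] == y:
                    # The body's color is propagated to the car.
                    derived.add(("color", x, other[2]))
    return derived


description = {("car", "o1"), ("component", "o1", "o2"),
               ("body", "o2"), ("color", "o2", "red")}

closure = apply_car_body_color_rule(description)
print(("color", "o1", "red") in closure)
```

Once color(o1, red) has been added, the query "car(x), color(x,red)" can be bound against the extended fact set.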

If we are just interested in exact matching of a user query to the description of a mul- 
timedia data, our current matching technique [LUM89, HOLT90] would be quite adequate. 
However, a common problem lies in the fact that the user query is likely to result in an 
empty answer in which no exact matching to the description of stored multimedia data oc- 
curs. In this case, an efficient system will try to perform approximate matching whereby 
descriptions of multimedia data that satisfy some generalization of the user query are se- 
lected. Our objective, then, is to perform approximate matching to a user query efficiently. 
As mentioned earlier, our proposed approximate matching algorithm makes use of do- 
main-dependent knowledge to meet the objective. 

4.2 Domain-Dependent Knowledge 

Earlier, we justified the use of captions to describe multimedia data by stating that each 
multimedia application restricts the scope of the description of multimedia data. This 
means that the domain of discourse for the captions is limited for each multimedia
application. Domain-dependent knowledge consists of the key concepts in the domain of
discourse of the captions. For our purposes, we only include concepts of nouns and verbs
in the domain-dependent knowledge.

To represent domain-dependent knowledge, we chose the object-oriented data model
[BANE87, KIM89, ZDON90]. The object-oriented model supports highly structured, complex
objects and can capture naturally any mini-world entity. The data model has been
used widely in such areas as CAD/CAM, VLSI, office automation, software engineering
and AI. Our justification for using the object-oriented model to represent domain-dependent
knowledge is as follows. First, it supports the generalization and specialization
abstractions, which permit conceptual generalization on the contents of the captions. Second,
researchers [WOELK87, HOLT90] have identified the use of the object-oriented model in
multimedia database applications as an appropriate and viable option.

Without loss of generality, we restrict our domain to the military history
of US forces in the Pacific during World War 2. The main reason is that we tested our
current prototype MDBMS in a military application based on this domain.
We therefore apply our approximate matching technique to the domain
of military history. However, we claim that our approximate matching technique can
be applied to other multimedia applications.

Figure 3 shows an example of the generalization hierarchy of a plane, a noun concept 
in our domain of discourse. It is the domain-dependent knowledge on planes that partici- 
pated in the Pacific during World War 2. We assume that the reader is familiar with object- 
oriented concepts such as object, class, inheritance along class hierarchy or lattice and 
methods. We also assume that the direction of the arrow in Figure 3 is from a class to its 
subclass. In Figure 3, the Plane class is specialized into classes Transport, Fighter, Bomb- 
er and Seaplane. Class Transport is specialized into class C-47 and class Fighter is spe- 
cialized into classes F6F-Hellcat, Corsair and Zero. In addition, class Bomber is 
specialized into class B-25 and class Divebomber which is further specialized into classes 
Zero, Dauntless and Stuka. 




Figure 3: Generalization Hierarchy of a Plane

The generalization hierarchy of a plane is a class lattice since class Zero has two
superclasses, namely class Fighter and class Divebomber. In addition, properties of
superclasses are inherited by all their subclasses along the superclass/subclass hierarchy but
not vice versa.
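
As an illustration, the plane hierarchy described in the text for Figure 3 can be written down as a small table of class-to-subclass arrows. The Python sketch below is one possible encoding, not the representation used in the prototype; the dictionary name SUBCLASSES and the helper functions are assumptions introduced for the example.

```python
from typing import Dict, List, Set

# Arrows go from a class to its direct subclasses, as described for Figure 3.
SUBCLASSES: Dict[str, List[str]] = {
    "Plane": ["Transport", "Fighter", "Bomber", "Seaplane"],
    "Transport": ["C-47"],
    "Fighter": ["F6F-Hellcat", "Corsair", "Zero"],
    "Bomber": ["B-25", "Divebomber"],
    "Divebomber": ["Zero", "Dauntless", "Stuka"],
}


def direct_superclasses(cls: str) -> List[str]:
    """Classes that list `cls` among their direct subclasses."""
    return sorted(parent for parent, children in SUBCLASSES.items() if cls in children)


def descendants(cls: str) -> Set[str]:
    """All direct and indirect subclasses of `cls`."""
    found: Set[str] = set()
    stack = list(SUBCLASSES.get(cls, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(SUBCLASSES.get(current, []))
    return found


def ancestors(cls: str) -> Set[str]:
    """All direct and indirect superclasses of `cls`."""
    found: Set[str] = set()
    stack = direct_superclasses(cls)
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(direct_superclasses(current))
    return found


print(direct_superclasses("Zero"))   # two superclasses, so the hierarchy is a lattice
print(sorted(ancestors("Zero")))
print(sorted(descendants("Bomber")))
```

Because Zero appears under both Fighter and Divebomber, a traversal must tolerate reaching the same class along more than one path, which is why both helpers track the classes already visited.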

Figure 3 is one example of domain-dependent knowledge corresponding to a noun
concept (i.e. plane) in the domain of discourse. For our purposes, we can have
domain-dependent knowledge for all noun and verb concepts in our domain of discourse.
Some of the noun and verb concepts may belong to the same class or generalization
hierarchy. Hence, a generalization hierarchy need not be created for each and every
noun or verb concept.



4.3 Partial Matching Algorithm 

In this section, we discuss our partial matching algorithm. For clarity, we develop
the algorithm by following through an example. Unless explicitly stated otherwise,
we refer to the example generalization hierarchy given in Figure 3. We first describe
the behavior we want from the algorithm.

Suppose that we have images of planes stored in the multimedia database and the
images are described as transport planes. Let us now assume that a user gives a query
asking for all planes which are C-47s. Even though there is no exact match, we
should retrieve all transport planes stored because any C-47 is a transport plane according
to the domain-dependent knowledge. Now, if the user asks for all fighter planes, we
cannot simply retrieve all transport planes because they may not be what the user wants.
However, a user asking for planes is more likely to retrieve the stored transport planes
than a user asking for fighter planes, because a transport plane is still a plane but is not
a fighter plane.

The goal of our algorithm is also to minimize the influence of the definition of the hier- 
archy which is dependent on the designer. The generalization hierarchy designer might 
have a view of the domain dependent knowledge which may not be consistent with the 
view of other people. This phenomenon might bias some specific branch of the generali- 
zation hierarchy over other branches during partial matching. 

An efficient partial matching algorithm has to deal with problems such as the
ones addressed above and come up with a general solution. We solve these problems by
using heuristics to define a weight ranking system over one or more generalization hierarchies.
Our major objective is to come up with a weight ranking scheme that is both fair and
accurate and that can be used to determine whether stored multimedia data should be
retrieved given a user description.

4.3.1 Weight Ranking Scheme within a Generalization Hierarchy

In this section, we will discuss the weight ranking strategy used by our partial matching 
algorithm given a single generalization hierarchy. The weight ranking strategy used for a 
group of generalization hierarchies will be discussed in the subsequent section. The 
weight ranking strategy used on a generalization hierarchy is a consequence of the se- 
mantics of the class hierarchy (lattice) or the IS-A hierarchy concept supported in an ob- 
ject-oriented data model. 

Given a class C in a generalization (class) hierarchy for a noun or a verb concept, and
assuming that a class, other than C, with a rank of positive weight is a specialization of
class C while one with a rank of negative weight is a generalization of C, we can introduce
the following two general heuristics:

Heuristic 1: All direct (indirect) subclasses of C have positive weights.

Heuristic 2: All direct (indirect) superclasses of C have negative weights.

Heuristic 1 says that given a class C specified in a user query, all subclasses of C in
the class hierarchy to which C belongs are specializations of the class and are given more
(positive) weight. Heuristic 2 says that given a class C specified in a user query, all
superclasses of C in the class hierarchy to which C belongs are generalizations of the class
and are given less (negative) weight. This reasoning follows directly from the definition
of a class (IS-A) hierarchy and the relationships among classes along the class hierarchy in
the context of an object-oriented data model.

The assignment of negative weights to generalizations is intuitively clear. The assignment
of positive weights to specializations is based on the fact that a specialization inherits
all properties of its parent nodes in addition to having its own additional information.
Hence, we feel that positive, or more, weight should be assigned to the nodes on the paths
towards specialization in the hierarchy.

Given the heuristics, it is easy to see that all classes in the class hierarchy which have 
ranks of positive weights relative to the class C, which is specified in the user query as 
either a noun or a verb concept, are selected during approximate matching. This is be- 
cause all classes with ranks of positive weights are subclasses (specialized classes) of 
the class C, specified by the user query. Since each of the classes is a specialized version 
of class C, it encompasses properties of class C and indeed is class C. 

On the other hand, all classes in the class hierarchy which have ranks of negative 
weights relative to the class C which is specified by the user query should be restrictively 
selected depending on the weights. This is because all classes with ranks of negative 
weights are superclasses (generalized classes) of the specified class C along the class 
hierarchy. Since each of the classes is a generalized version of class C, it does not en- 
compass all properties of class C and is not class C. The question of which classes to se- 
lect depends on getting information from the user on how far to generalize. 

The weight ranking system introduced so far is only loosely defined. What is defined is
that, given a class C in a class hierarchy of interest, any class belonging to the same class
hierarchy which is assigned a positive weight is always selected during approximate matching.
On the other hand, a class in the same class hierarchy which is assigned a negative weight is
selected during approximate matching only if its weight exceeds a threshold given by the user.
We now discuss the assignment of weights for different classes in the class hierarchy of
interest.
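
The selection rule just stated can be sketched in a few lines. This is a minimal illustration rather than the system's implementation; the function name `is_selected` and the convention that the threshold is itself a negative number are assumptions.

```python
def is_selected(weight: float, threshold: float) -> bool:
    """Decide whether a class takes part in approximate matching.

    Assumption: the user-supplied threshold is a negative number, and a
    negatively weighted class "exceeds" it when weight > threshold.
    """
    if weight >= 0:
        # The query class itself (weight 0) and all of its subclasses
        # (positive weight) always qualify.
        return True
    # Generalized classes qualify only if they are not too far from C.
    return weight > threshold
```

With a threshold of -50, for instance, a class weighted -44 would be selected and a class weighted -62 would not.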




There are three different situations in which weights can be assigned to classes in a
class hierarchy. The different situations are shown in Figure 4. Suppose that the class
specified in a user query is class C. As before, we assume that the direction of the arrow
is from a class to its subclass. For example, in Figure 4(a), class C is a superclass of class
X and class X is a subclass of class C. The first situation, shown in Figure 4(a), assigns a
weight to a class (X or Y) which is a subclass of class C. The second situation, shown in
Figure 4(b), assigns a weight to a class (X or Y) which is a superclass of class C. The
third situation, shown in Figure 4(c), assigns a weight to a class (Y) which is a subclass
of a superclass (i.e. X) of class C.

The principles behind our weight ranking system are quite simple. We assume that all
classes with positive weights, and those classes with negative weights that exceed a
threshold value, are selected during approximate matching. First, we assign a weight of 0
to the class C specified in the user query. Class C is the reference point for all other
classes in the class hierarchy during approximate matching. For classes which are subclasses
of class C, we assign positive weights because they are specialized versions of class C.
Specialized versions of class C have more specific and definite information than C itself and
hence are assigned positive weights instead of 0. For our purposes, all subclasses of C
are assigned the same positive weight.

For classes which are superclasses or subclasses of superclasses of class C, we assign
negative weights because they are generalized versions of class C. Generalized versions of
class C have less specific, more general information than C itself and hence are
assigned negative weights. Different generalized versions have different negative
weights. However, in assigning negative weights, we have to minimize the influence of the
way the model happens to be defined. It remains true that the further away a class is from
class C in the class hierarchy, the more negative the weight assigned to the class.

In most systems, the weight of a class decreases linearly with the depth level of the class
relative to the level of the class C specified in the user query. We believe that this is not
the correct approach because the relative distance of a particular class to the class of
interest, in this case class C, with respect to other classes is not an absolute distance but
an artificial one caused by a particular designer’s view of the domain knowledge. The main
problem with this approach is that some classes belonging to a lengthy branch could be
unfairly disqualified because of their larger negative weights. Our weight ranking system
tries to minimize the bias against lengthy branches of a class hierarchy relative to shorter
branches.

Given that class C is the class specified by the user query as shown in Figure 4, the weight
assignment formulas for the query class itself and for the three situations mentioned
are as follows.

(1) Class specified by user query (i.e. class C in Figure 4)
weight = 0

(2) Subclass of class C (i.e. class X or Y in Figure 4(a))
weight = α, where α is an integer constant

(3) Superclass of class C (i.e. class X or Y in Figure 4(b))

[equation missing in source]

where α, β are integer constants and n is the level number of the superclass relative to class C

(4) Subclass of a superclass of class C (i.e. class Y in Figure 4(c))

[equation missing in source]

where α, β, γ, m are integer constants; n is the level number of the superclass relative to
class C and l is the level number of the subclass relative to that superclass

In our scheme, a class which is assigned a positive weight is always selected during
partial matching. A class with a negative weight can be selected provided that its weight
does not fall below a threshold value set by the user. To understand the weight assignments
for different classes, we next give some examples using the class hierarchy of Figure 3. Given
a user query, if the image corresponding to the user description is not found in the
database, the system automatically proceeds with approximate matching. Using the
weight assignment formulas and given some user query descriptions, the weights for
some of the classes in the class hierarchy are as follows. For the sake of argument, we
assume that the values of α, β, γ and m are 40, 2, 48 and 2 respectively.

(1) “A transport plane sank in the Pacific” 

Transport = 0, C-47 = 40, Plane = -20, Fighter = -44, Corsair = -56 

(2) “An F6F-Hellcat sank in the Pacific”

F6F-Hellcat = 0, Plane = -30, Seaplane = -54, Stuka = -72, C-47 = -66 

(3) “A bomber sank in the Pacific” 

Bomber = 0, Stuka = 40, B-52 = 40, Plane = -20, Seaplane = -44, C-47 = -56 

In the examples shown, all classes which are assigned positive weights are selected
during partial matching. In example (1), the class C-47 has a positive weight of 40. This
means that the image whose description is “A C-47 sank in the Pacific” is selected during
partial matching. As shown in the examples, all classes which are subclasses of the
class specified in a user query are assigned positive weights. All classes which
are superclasses, or subclasses of the superclasses, of the class specified in a
user query are assigned negative weights. For these classes, the weight decreases with the
depth level of the class relative to the level of the class specified in the user query along
the class hierarchy, although not strictly linearly. In example (2), the class Seaplane has a
negative weight of -54. This means that the image whose description is “A Seaplane sank in
the Pacific” has a weight of -54. Class Stuka has a negative weight of -72, since class Stuka
is further away from F6F-Hellcat than class Seaplane is.
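
The example weights above can be checked with a short sketch. The negative-weight cases are assumed here to be truncated geometric sums, a form chosen only because it reproduces every weight listed in examples (1) to (3); it is not necessarily the exact formula intended. The function name `class_weight` and the level counts attached to each class (inferred from the listed weights rather than read from Figure 3) are likewise assumptions.

```python
# Constants as given for the examples: 40, 2, 48 and 2.
ALPHA, BETA, GAMMA, M = 40, 2, 48, 2


def class_weight(up: int, down: int) -> float:
    """Weight of a class relative to the query class C.

    up   -- levels from C up to the common superclass (0 if none is needed)
    down -- levels from that superclass (or from C itself) down to the class
    """
    if up == 0:
        # C itself gets 0; every subclass of C gets the same constant.
        return 0.0 if down == 0 else float(ALPHA)
    # Assumed form: each further level adds a geometrically shrinking penalty.
    superclass_penalty = sum(ALPHA / BETA**i for i in range(1, up + 1))
    subclass_penalty = sum(GAMMA / M**j for j in range(1, down + 1))
    return -(superclass_penalty + subclass_penalty)


# (up, down) pairs are inferred from the listed weights, not from Figure 3.
EXAMPLE_2 = {  # query class: F6F-Hellcat
    "F6F-Hellcat": (0, 0),
    "Plane": (2, 0),
    "Seaplane": (2, 1),
    "C-47": (2, 2),
    "Stuka": (2, 3),
}

for name, (up, down) in EXAMPLE_2.items():
    print(f"{name:12s} {class_weight(up, down):6.0f}")
```

Under these assumptions the same function also gives the weights of examples (1) and (3), for instance `class_weight(1, 1)` for Fighter relative to Transport.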

Suppose instead that the weight of a class decreases linearly with the depth level
of the class relative to the level of the class C specified in the user query. If we assign a
negative constant weight, say -10, for each level away from class C, the class which is 5
levels away from class C will have a negative weight of -50 compared to a negative value
of -20 for a class which is 2 levels away from class C. For example, if the user query is “A
transport sank in the Pacific”, the weight of class Transport is 0, class Seaplane is -20
and class Stuka is -40. Using our formulas, the same user query assigns classes Transport,
Seaplane and Stuka weights of 0, -44 and -62 respectively. Relative to class Seaplane, class
Stuka is penalized more heavily by the linear method (twice the penalty) than by our method
(about 1.4 times the penalty).
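
The comparison can be made concrete with a small sketch. The geometric form of the non-linear weights is an assumption that happens to reproduce the figures quoted in the text, and the function names and level counts are hypothetical.

```python
def linear_weight(up: int, down: int, step: int = 10) -> int:
    """Linear scheme: a fixed penalty for every level travelled."""
    return -step * (up + down)


def saturating_weight(up: int, down: int,
                      alpha: int = 40, beta: int = 2,
                      gamma: int = 48, m: int = 2) -> float:
    """Assumed geometric form; intended for up >= 1 (classes that are not subclasses of C)."""
    return -(sum(alpha / beta**i for i in range(1, up + 1))
             + sum(gamma / m**j for j in range(1, down + 1)))


# Query class Transport: Seaplane is 1 level up and 1 down, Stuka is 1 up
# and 3 down (level counts inferred from the weights quoted in the text).
seaplane, stuka = (1, 1), (1, 3)

linear_ratio = linear_weight(*stuka) / linear_weight(*seaplane)
ours_ratio = saturating_weight(*stuka) / saturating_weight(*seaplane)
print(f"linear: Stuka is penalized {linear_ratio:.2f}x as much as Seaplane")
print(f"ours:   Stuka is penalized {ours_ratio:.2f}x as much as Seaplane")

# Under the assumed form, descending a very long branch never costs more
# than gamma / (m - 1) in total, so lengthy branches are not ruled out.
deep = saturating_weight(1, 50)
print(f"1 up, 50 down: {deep:.2f}")
```

The design point illustrated is that the per-level penalty shrinks with depth, so the length of a branch, which reflects the designer's modelling choices, has a bounded effect on the weight.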

It is very difficult to quantify how much closer class Seaplane is to class Transport than
class Stuka is to class Transport, as both Seaplane and Stuka are types of planes. Our
formulas are designed to minimize bias as far as possible. With the linear method, a user is
more likely to select a threshold value that admits class Seaplane but excludes class Stuka
during approximate matching than with our dynamic method. Another difficult task is to set
the values of the constants to be applied in our assignment formulas as well as the threshold
value. The user must choose appropriate values for the constants and the threshold value
depending on the number of objects that qualify during approximate matching. Hence, it is
necessary for the system to interact with the user through the user interface throughout the
matching process.

4.3.2 Weight Ranking Scheme for a Group of Generalization Hierarchies 

In the previous section, we discussed the ranking of weights for classes belonging to
the same generalization hierarchy. In this section, we extend the ranking of weights to
classes belonging to different generalization hierarchies. In our scheme, the ranking of
weights for classes in different class hierarchies is governed by the following rules.

Rule 1: For each local class hierarchy, the weight ranking system discussed in Section
4.3.1 is applied.

Rule 2: For different class hierarchies, the user determines the priority order of class 
hierarchies. 

Using rule 1, for each class hierarchy selected by the user query, the classes within
the class hierarchy are assigned weights using the weight ranking system discussed in
Section 4.3.1, which is a straightforward process. Hence, regardless of the number of class
hierarchies involved, all classes belonging to these hierarchies can be assigned weights
for partial matching. Rule 1 does not cause any problems because there is no
interrelationship between classes of different class hierarchies during weight
assignment.




Figure 5: Combination of two Class Hierarchies 

The global ranking of weights for different class hierarchies is a problem because the
weights assigned within each class hierarchy now have to be considered with respect to the
weights assigned for other class hierarchies, and they have to be meaningful globally. Rule
2 obtains from the user, through the user interface, the priority order of importance of the
selected class hierarchies. Different class hierarchies can then be assigned different
weights according to this priority order.

Figure 5 shows a combination of classes belonging to two different class hierarchies.
For our purposes, we now consider a user query description “C and D” involving classes
C and D belonging to different class hierarchies in Figure 5. There are three generic partial
matching combination types, given as follows. The example classes in the
combination types are taken from Figure 5. CH1 is the name of the class hierarchy on the
left and CH2 is the name of the class hierarchy on the right in Figure 5.

Type 1: C of CH1, and any class in CH2 except D.

Type 2: Any class in CH1 except C, and D of CH2.

Type 3: Any class in CH1 except C, and any class in CH2 except D.

The ranking of weights for type 1 and type 2 combinations is easy to handle by using
the previously discussed weight ranking system. This is because we only need to assign
weights to classes in one of the class hierarchies but not both. However, handling a type 3
combination requires closer attention because it requires assigning weights to classes
belonging to different class hierarchies. To assign weights in this case, we determine the
priority order of CH1 and CH2 through feedback from the user. Through the user interface,
we get information on which class hierarchy has a higher priority. We then assign different
weights for CH1 and CH2 depending on the priority order.




Figure 6: Generalization Hierarchy of a Place 

This can be expressed using the following weight formulas. The constant values of ω
and δ have to be determined by the user through the user interface.

(1) Weight (Type 1) = Weight (CH2)

(2) Weight (Type 2) = Weight (CH1)

(3) Weight (Type 3) = ω (Weight (CH1)) + δ (Weight (CH2))
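
The three combination formulas can be sketched as a single function. The function name, the use of a weight of 0 to recognize the query class in each hierarchy, and the values chosen for the two user-supplied constants (called `omega` and `delta` here) are assumptions made for illustration; the text does not specify how a priority order maps to particular constant values.

```python
def combined_weight(w_ch1: float, w_ch2: float,
                    omega: float, delta: float) -> float:
    """Combine per-hierarchy weights for a two-class query "C and D".

    w_ch1 -- weight of the matched class in CH1 relative to C (0 means C itself)
    w_ch2 -- weight of the matched class in CH2 relative to D (0 means D itself)
    """
    if w_ch1 == 0:
        return w_ch2  # Type 1 (or an exact match when w_ch2 is also 0)
    if w_ch2 == 0:
        return w_ch1  # Type 2
    return omega * w_ch1 + delta * w_ch2  # Type 3


# Hypothetical constants supplied by the user.
OMEGA, DELTA = 2, 1
print(combined_weight(0, -20, OMEGA, DELTA))    # Type 1 combination
print(combined_weight(-44, 0, OMEGA, DELTA))    # Type 2 combination
print(combined_weight(-44, -20, OMEGA, DELTA))  # Type 3 combination
```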

Figure 6 is a generalization (class) hierarchy of the noun concept Place. Using Figure
3 and Figure 6, given a user query description “A C-47 sank in the Ocean”, an example
of a type 1 combination is a multimedia data item with the description “A C-47 landed in an
Island”. A type 2 combination is a multimedia data item with the description “A Stuka sank in
the Ocean”. Finally, a type 3 combination is a multimedia data item with the description “A
Transport landed in Sea”.

In this section, we discussed the assignment of weights for classes involving two
different class hierarchies. For a practical system, the number of class hierarchies involved
in weight assignment is obviously large, since many noun and verb concepts are involved.
It is not difficult to see that the weight ranking scheme discussed in this section can be
easily extended to assign weights for classes involving many class hierarchies. The main
problem lies in how well the user interface obtains the necessary information from the user.
Obviously, the weight ranking system has to be dynamic, since all constant values assigned
by the user can change depending on the number of qualified multimedia data selected
during partial matching. The user also has to determine the threshold value such that not
too many multimedia data are selected from the database.

4.3.3 Application of Weighting Algorithm 

The application of the weighting algorithm just presented requires a parser to under- 
stand the natural language specifications in the multimedia data descriptions and the user 
queries. As stated earlier, the descriptions are parsed and stored in the system as predi- 
cates. The queries are processed as follows. 

When a query is received from the user, the parser separates the natural language
specification into smaller component groups, namely subject noun, verb and object noun
phrases. Each of these becomes a predicate. When these predicates match exactly
with the predicates in the descriptions of certain multimedia data, those multimedia
data are retrieved. However, there may be other multimedia data that are actually of
interest to the user but whose descriptions are not logically implied by the query as
stated. This latter category is expected to be the usual case rather than the former,
for reasons stated earlier.

To find the latter, we suggest that the system search the noun and verb generalization
hierarchies of the object classes and assign weights to the descriptions as given in the
weight assignment algorithm, applying the appropriate weighting factors (ω and δ in the
previous section) as received from the user. Those multimedia data whose combined weight
exceeds the threshold value set by the user are then retrieved.
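
The retrieval procedure just described can be sketched end to end for the simplest case, in which only the subject noun is relaxed. Everything concrete here is an assumption made for illustration: the hierarchy fragment in `PARENT`, the geometric form of the negative weights (chosen because it reproduces the example weights of Section 4.3.1), the predicate layout `(subject, verb, place)`, and the function names.

```python
# Hypothetical fragment of a noun hierarchy, stored as child -> parent.
PARENT = {
    "Seaplane": "Plane", "Fighter": "Plane", "Bomber": "Plane",
    "Transport": "Plane", "Corsair": "Fighter", "F6F-Hellcat": "Fighter",
    "B-52": "Bomber", "C-47": "Transport",
}
ALPHA, BETA, GAMMA, M = 40, 2, 48, 2


def ancestors(cls):
    """Return [cls, parent, grandparent, ...] up to the root."""
    chain = [cls]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def weight(query_cls, other_cls):
    """Weight of other_cls relative to query_cls (same hierarchy assumed)."""
    up_chain, down_chain = ancestors(query_cls), ancestors(other_cls)
    # Lowest common ancestor: first ancestor of the query class that is
    # also an ancestor of the other class.
    lca = next(c for c in up_chain if c in down_chain)
    up, down = up_chain.index(lca), down_chain.index(lca)
    if up == 0:
        return 0.0 if down == 0 else float(ALPHA)
    return -(sum(ALPHA / BETA**i for i in range(1, up + 1))
             + sum(GAMMA / M**j for j in range(1, down + 1)))


def retrieve(query, descriptions, threshold):
    """Exact match first; otherwise approximate match on the subject noun."""
    exact = [d for d in descriptions if d == query]
    if exact:
        return exact
    subject, verb, place = query
    hits = []
    for d in descriptions:
        if d[1:] != (verb, place):
            continue  # this sketch relaxes only the subject noun
        w = weight(subject, d[0])
        if w > 0 or w > threshold:
            hits.append((w, d))
    # Best (highest) weights first.
    return [d for w, d in sorted(hits, reverse=True)]


stored = [
    ("C-47", "sank", "Pacific"),
    ("Fighter", "sank", "Pacific"),
    ("Corsair", "sank", "Pacific"),
    ("Seaplane", "landed", "Pacific"),
]
result = retrieve(("Transport", "sank", "Pacific"), stored, threshold=-50)
print(result)
```

A fuller version would relax the verb and object noun in the same way and combine the per-hierarchy weights with the user-supplied weighting factors before applying the threshold.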

The natural language query can also be separated into smaller components than the
three groups just stated. For example, a complex noun phrase may be separated into a
number of small noun groups and the weighting algorithm applied to these groups to
obtain a combined weight. For example, “the man with a mustache” can become two classes,
namely man and mustache. Naturally, the finer the granularity of the separation, the
larger and more complex the processing needed.

5. Summary 

A major problem in retrieving multimedia data, such as a sound or an image, stored in
a database is that they are intrinsically rich in semantics, and conventional search methods
used in databases and information retrieval systems may not work or are of little use. Most
research on intelligent IR systems is concerned with natural language processing and
deductive capabilities based on extended semantic models of document content and of
the user. However, most of these systems deal with exact matching or primitive partial
matching using simple linear methods.

In this paper, we discussed some of the fundamental problems faced by an MDBMS
during data retrieval and outlined an architecture of an IR system for multimedia databases.
The main contribution of our paper is the formulation of a partial matching algorithm that
uses domain knowledge, represented using an object-oriented data model, and a weight
ranking system to assign weights to different multimedia data stored in a database and
to select those multimedia data that partially match a given user query description. Our
dynamic matching scheme also interacts with the user to get further information regarding
the soundness and correctness of the partial match.

Our parser provides mechanisms to automatically partition a user query into the
subject noun, verb and object noun components. This is essential in that, during data
retrieval, we use the partitioned components to match against generalization hierarchies of
domain-dependent knowledge, which also deal with noun and verb categories. Further
research is necessary to improve the parser to also automatically derive adjectives and
other caption components for complete understanding and processing of captions in the
context of partial matching.

We believe that our approach is both simple and elegant. The simplicity lies in exploiting
the semantics of the generalization and specialization abstractions of the object-oriented
model. We also believe that our approach is a general one that can be readily applied to
applications in IR and AI. However, in this paper we left out details of the user interface
component of our retrieval system. We believe that this is an area of research interest and
we are currently investigating different methods.